Standard Deviation versus Standard Error

Steven C. Howell

5 January 2017

References:

[1] Brown, G. W. Standard deviation, standard error: Which ‘standard’ should we use? American Journal of Diseases of Children 136, 937–941 (1982).
[2] Cumming, G., Fidler, F. & Vaux, D. L. Error bars in experimental biology. The Journal of Cell Biology 177, 7–11 (2007).
[3] Biau, D. J. In Brief: Standard Deviation and Standard Error. Clin Orthop Relat Res 469, 2661–2664 (2011).

Representing a Distribution

What is the best way to represent or summarize a group of measurements or counts?

central value

mode: the most frequent value
median: the middle value when the values are sorted (half of the values lie below it and half above)
mean ($\mu$): the average of all values

dispersion

range: lowest and highest value
quartile range: range between chosen percentiles, e.g., the 25th and 75th (ideal for asymmetric distributions)
variance: average of the squared distances from the mean
standard deviation ($\sigma$): square root of the variance (also referred to as the "root-mean-square" deviation from the mean). Typical or (roughly speaking) average difference between the data points and their mean.
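As a minimal sketch, the summary statistics listed above can be computed with Python's standard `statistics` module. The data values are made up purely for illustration, and the population forms (dividing by $N$) are used for the variance and standard deviation.

```python
import statistics

# Hypothetical measurements, chosen only for illustration
values = [2, 3, 3, 4, 5, 6, 12]

# Central value
mode = statistics.mode(values)      # most frequent value
median = statistics.median(values)  # middle value of the sorted data
mean = statistics.mean(values)      # arithmetic average

# Dispersion
value_range = (min(values), max(values))        # lowest and highest value
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points
variance = statistics.pvariance(values)         # population variance (divide by N)
sd = statistics.pstdev(values)                  # population standard deviation

print(mode, median, mean)
print(value_range, (q1, q3), variance, sd)
```

The single large value (12) pulls the mean above the median, which is the kind of asymmetry the quartile range is better suited to describe.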

Standard Deviation (descriptive statistic)

Mean of a population: $\quad \mu = \frac{\sum_i^N x_i}{N}$

Variance of a population: $\quad \sigma^2 = \frac{\sum_i^N{(x_i - \mu)^2}}{N}$

Standard deviation of a population: $\quad \sigma = \sqrt{\frac{\sum_i^N{(x_i - \mu)^2}}{N}}$

Number in Population: $\quad N$

For gaussian distributions, the mean and standard deviation completely describe the distribution.

What about non-gaussian distributions?

A frequent pitfall occurs when reporting a mean and standard deviation for a non-gaussian distribution. When a distribution is not gaussian, the standard deviation can still be calculated and reported, but the meaning is easily misinterpreted.

Consider the example from reference [1] in which certain 1976 medical articles had an average of 4.9 authors $\pm7.3$ (SD). At face value, this would indicate that 95% of articles had $4.9\pm(1.96\times7.3)$ authors, or from $-9.4$ to $+19.2$; or that 25% of articles had zero or fewer authors.

In this situation, a better description would be provided using the mean and range, or quartile ranges.
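The arithmetic behind this example can be checked directly. The sketch below assumes, as the at-face-value reading does, that the author counts follow a gaussian distribution with mean 4.9 and SD 7.3; it uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

mean, sd = 4.9, 7.3  # values quoted from reference [1]

# Interval that would contain 95% of a gaussian distribution
low, high = mean - 1.96 * sd, mean + 1.96 * sd

# Fraction of a gaussian with this mean and SD lying at or below zero
frac_nonpositive = NormalDist(mu=mean, sigma=sd).cdf(0.0)

print('95% range: {:.1f} to {:.1f}'.format(low, high))
print('fraction with zero or fewer authors: {:.2f}'.format(frac_nonpositive))
```

Both results are impossible for a count of authors, which is the sign that the gaussian reading of the SD does not apply here.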

Absolute versus Estimate Statistics for a Population

The mean and standard deviation are used both to describe the distribution of a population of measurements and to estimate the distribution of a population based on a sample of measurements.

This difference means there are two slight changes in the calculation of the standard deviation for the second case. First, instead of the exact population mean, $\mu$, the mean is an estimate of the larger population mean, $\bar{x}$, though it is calculated the same way. The second difference, which actually changes the result, is that the sum of the squared differences from the mean is divided by $n-1$ rather than $N$. This change produces a slightly larger standard deviation, compensating for the tendency of a sample to underestimate the full spread of values present in the entire population.

Estimate of the population mean: $\quad \bar{x} = \frac{\sum_i^n x_i}{n}$

Estimate of the population standard deviation from a sample: Typical or average difference between the data points and their mean. $$\quad SD = \sqrt{\frac{\sum_i^n{(x_i - \bar{x})^2}}{n-1}}$$

Number in Sample: $\quad n$

While there is only a slight difference in calculating these two standard deviations, there is a real difference in their meaning. Unfortunately it is rare to see a distinction made when reporting a standard deviation value.
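A short sketch of the two calculations, using the standard library and a made-up sample of five values, shows how the choice of divisor changes the result.

```python
import statistics

# Hypothetical sample of five measurements
sample = [4.0, 5.0, 6.0, 7.0, 8.0]

sd_population = statistics.pstdev(sample)  # divides by N
sd_sample = statistics.stdev(sample)       # divides by n - 1

print('divide by N    : {:.3f}'.format(sd_population))
print('divide by n - 1: {:.3f}'.format(sd_sample))
```

With NumPy the same choice is made through the `ddof` argument: `np.std(x)` divides by $N$, while `np.std(x, ddof=1)` divides by $n-1$. Stating which one was used removes the ambiguity when reporting a value.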

Standard Error (inferential statistic)

The standard error is used to construct a confidence interval within which the true value is expected to lie; it does not describe the distribution of values (it also has nothing to do with standards or errors). Whenever it is used, the statistic it corresponds to should be stated, e.g., standard error of the mean (SEM).

Standard Error of the Mean, SEM

Standard error of the mean: A measure of how variable the mean will be, if you repeat the whole study many times. $$\quad SEM = SD_{\bar{x}} = \frac{SD}{\sqrt{n}} = \sqrt{\frac{\sum_i^n{(x_i - \bar{x})^2}}{n\, (n-1)}}$$

The SEM can be used to declare something like the following: "The mean of the sample was 73 mg/dL, with an SE of the mean of 3 mg/dL. This implies that the mean of the population from which the sample was randomly taken will fall, with 95% probability, in the interval of $73\pm(1.96\times3)$ mg/dL, which is from 67.12 to 78.88 mg/dL." (from reference [1])
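The following sketch reproduces the interval arithmetic of the quoted example (a half-width of $1.96\times3$ mg/dL around 73 mg/dL) and shows how the SEM could be computed from raw data. The function names `calc_sem` and `ci95` and the raw measurements are illustrative assumptions, not part of the reference.

```python
import math
import statistics

def calc_sem(values):
    """Standard error of the mean: sample SD (n - 1 divisor) over sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

def ci95(mean, sem):
    """Approximate 95% interval, assuming a normal sampling distribution."""
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width

# Numbers from the quoted example: mean 73 mg/dL, half-width 1.96 * 3 mg/dL
low, high = ci95(73.0, 3.0)
print('{:.2f} to {:.2f} mg/dL'.format(low, high))

# Hypothetical raw measurements, for illustration only
glucose = [70.0, 75.0, 68.0, 80.0, 72.0]
print('SEM = {:.2f} mg/dL'.format(calc_sem(glucose)))
```

The factor 1.96 assumes a reasonably large sample; for small $n$ the multiplier is larger, as noted in the rules from reference [2] later in this document.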

The above formula for the $SEM$ can be rearranged for more efficient computation as follows. $$ SEM^2 = \frac{\sum_i^n{(x_i-\bar{x})^2}}{n (n-1)}$$

$$ SEM^2 = \frac{\sum_i^n{(x_i^2 - 2x_i\bar{x} + (\bar{x})^2)}}{n (n-1)}$$

$$ SEM^2 = \frac{1}{n-1} \left[\frac{\sum_i^n x_i^2}{n} - 2\,\bar{x}\,\frac{\sum_i^n x_i}{n} + (\bar{x})^2\,\frac{\sum_i^n 1}{n}\right]$$

Note that $$\frac{\sum_i^n x_i}{n} = \bar{x}\,,$$ and $$\frac{\sum_i^n 1}{n} = 1\,.$$ Making these replacements, we are left with, $$ SEM^2 = \frac{1}{n-1} \left[\frac{\sum_i^n x_i^2}{n} - 2(\bar{x})^2 + (\bar{x})^2\right]\,,$$ or simply, $$ SEM^2 = \frac{1}{n-1} \left[\frac{\sum_i^n x_i^2}{n} - (\bar{x})^2\right]\,.$$

Written with angle brackets to represent averages, this is simply, $$ SEM^2 = \frac{\langle x^2 \rangle - \langle x \rangle^2}{n-1}\,.$$

This form allows the variance (the part in square brackets above) to be calculated from two running sums, one of $x^2$ (the first term in the square brackets) and one of $x$ to get the mean, both of which can be accumulated in a single pass through the data.
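As a sketch of the rearranged formula, the function below accumulates the sums of $x$ and $x^2$ in one loop and is compared against the direct formula based on squared deviations. The function names and data are illustrative assumptions.

```python
import math

def sem_running_sums(values):
    """SEM from running sums of x and x**2 (rearranged formula)."""
    n = 0
    sum_x = 0.0
    sum_x2 = 0.0
    for x in values:  # one loop accumulates both sums
        n += 1
        sum_x += x
        sum_x2 += x * x
    mean_x = sum_x / n
    mean_x2 = sum_x2 / n
    return math.sqrt((mean_x2 - mean_x ** 2) / (n - 1))

def sem_direct(values):
    """SEM from the squared deviations about the mean."""
    n = len(values)
    mean_x = sum(values) / n
    return math.sqrt(sum((x - mean_x) ** 2 for x in values) / (n * (n - 1)))

data = [4.0, 5.0, 6.0, 7.0, 8.0]  # hypothetical values
print(sem_running_sums(data), sem_direct(data))
```

One caveat: when the mean is very large compared with the spread, $\langle x^2 \rangle - \langle x \rangle^2$ subtracts two nearly equal numbers and can lose precision in floating point, so the direct formula may be the safer choice in that regime.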

Standard Error of Proportion, SEp

The standard error of proportion is used when describing the proportion of a group that has a certain classification. An example from reference [1] is when six of ten patients with zymurgy exhibit so-and-so. The natural interpretation is that we should expect to see so-and-so for 60% of patients with zymurgy.

The standard error of proportion provides an estimate for confidence intervals in this situation.

Standard error of proportion: $\quad SE_p = \sqrt{\frac{p (1-p)}{n}} $

Proportion estimated from sample: $\quad p$



In [7]:

    
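import numpy as np

def calc_sep(p, n):
    # standard error of a proportion p estimated from a sample of size n
    return np.sqrt(p * (1 - p) / n)

p = 0.6  # observed proportion (six of ten patients)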

Example with 10 patients



In [16]:

    
n=10
sep = calc_sep(p, n)
ci95 = 1.96 * sep
interval = p - ci95, p + ci95
print('sep : {:0.3}\n95% ci for n={}: {:0.3} to {:0.3}'.format(sep, n, interval[0], interval[1]))

sep : 0.155
95% ci for n=10: 0.296 to 0.904

So when 6 of 10 patients have a certain classification, the 95% confidence interval for the proportion of the entire population with that classification is $60\% \pm (1.96\times 15.5\%)$, or from $29.6\%$ to $90.4\%$.

Examples with larger patient samples



In [17]:

    
n=100
sep = calc_sep(p, n)
ci95 = 1.96 * sep
interval = p - ci95, p + ci95
print('sep : {:0.3}\n95% ci for n={}: {:0.3} to {:0.3}'.format(sep, n, interval[0], interval[1]))

sep : 0.049
95% ci for n=100: 0.504 to 0.696

Increasing the sample size to 100 narrows the 95% confidence interval for the proportion of the population that exhibits so-and-so to between $50.4\%$ and $69.6\%$.



In [18]:

    
n=1000
sep = calc_sep(p, n)
ci95 = 1.96 * sep
interval = p - ci95, p + ci95
print('sep : {:0.3}\n95% ci for n={}: {:0.3} to {:0.3}'.format(sep, n, interval[0], interval[1]))

sep : 0.0155
95% ci for n=1000: 0.57 to 0.63

Increasing the sample size to 1000 narrows the 95% confidence interval for the proportion of the population that exhibits so-and-so to between $57\%$ and $63\%$.

Notes

SEM is most appropriate for comparing two different curves to conclude similarity or difference (how likely are the differences due to random noise?)
In small-angle scattering (SAS), the error reported relates to the fidelity with which repeated measurements were performed, either from repeated scans or from measurements of different pixels at the same $q$-vector.
"Whenever you see a figure with very small error bars (such as Fig. 3), you should ask yourself whether the very small variation implied by the error bars is due to analysis of replicates rather than independent samples. If so, the bars are useless for making the inference you are considering." (reference [2])

Rules for effective use and interpretation of error bars (reference [2])

when showing error bars, always describe in the figure legends what they are
the value of $n$ (i.e., the sample size, or the number of independently performed experiments) must be stated in the figure legend.
error bars and statistics should only be shown for independently repeated experiments, and never for replicates. If a “representative” experiment is shown, it should not have error bars or P values, because in such an experiment, n = 1 (Fig. 3 shows what not to do).
because experimental biologists are usually trying to compare experimental results with controls, it is usually appropriate to show inferential error bars, such as SE or CI, rather than SD. However, if n is very small (for example n = 3), rather than showing error bars and statistics, it is better to simply plot the individual data points.
95% CIs capture $\mu$ on 95% of occasions, so you can be 95% confident your interval includes $\mu$. SE bars can be doubled in width to get the approximate 95% CI, provided n is 10 or more. If n = 3, SE bars must be multiplied by 4 to get the approximate 95% CI.
when n = 3, and double the SE bars don’t overlap, P < 0.05, and if double the SE bars just touch, P is close to 0.05. If n is 10 or more, a gap of SE indicates P ≈ 0.05 and a gap of 2 SE indicates P ≈ 0.01.
with 95% CIs and n=3, overlap of one full arm indicates $P\approx 0.05$, and overlap of half an arm indicates $P\approx 0.01$.
in the case of repeated measurements on the same group (e.g., of animals, individuals, cultures, or reactions), CIs or SE bars are irrelevant to comparisons within the same group.

Standard Error or Standard Deviation

If one wishes to provide a description of the sample, then the standard deviations of the relevant parameters are of interest. For instance we would provide the mean age of the patients and standard deviation, the mean size of tumors and standard deviation, etc.

If, on the other hand, one wishes to have the precision of the sample value as it relates to that of the true value in the population, then it is the standard error that should be reported. For instance, when reporting the survival probability of a sample we should provide the standard error together with this estimated probability. However, because the confidence interval is more useful and readable than the standard error, it can be provided instead as it avoids having the readers do the math.

How does this apply to Small-Angle Scattering (SAS)? What error is reported, and where does it originate?

[1] Pauw, B. R. Everything SAXS: small-angle scattering pattern collection and correction. Journal of Physics: Condensed Matter 25, 383201 (2013).
[2] Hura G, Sorenson J M, Glaeser R M and Head-Gordon T 2000 A J. Chem. Phys. 113 9140–9148
[3] Ilavsky, J. Nika : software for two-dimensional data reduction. Journal of Applied Crystallography 45, 324–328 (2012).

Uncertainty in the scattering intensity should be calculated from two sources, the photon counting statistics, and the standard error of the mean of the pixel values. "The photon counting (Poisson) statistics defines the absolute minimum possible uncertainty in any counting procedure. It does not consider other contributors to noise such as the variance between pixel sensitivities or electronic noise" [1]. Additionally, we can set a lower limit on the error in scattering intensity to never have a relative uncertainty estimate lower than 1%, as beamlines report it is challenging to be more accurate than this [2].

$$ \sigma_{\!_{Q_\text{bin}}} \!= \max\left\{{ \begin{array}{l l} \dfrac{1}{N_{Q_\text{bin}}}\sqrt{\displaystyle\sum_{Q_j\in\left[Q_k,\,Q_{k+1}\right]}\!\!\!\!\!\!\!\!\sigma_j^2} & \text{photon counting error}\\ \dfrac{1}{\sqrt{N_{Q_\text{bin}}}}\sqrt{\dfrac{\displaystyle\sum_{Q_j\in\left[Q_k,\,Q_{k+1}\right]}\!\!\!\!\!\!\!\!\left(I_j-I_{Q_\text{bin}}\right)^2}{N_{Q_\text{bin}}-1}} & \text{standard error of the mean}\\ \dfrac{I_{Q_\text{bin}}}{100} & 1\% \text{ of } I_{Q_\text{bin}} \end{array}} \right\}\,, $$
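A minimal sketch of this expression for a single $Q$-bin is given below, using only the standard library. The function name `bin_uncertainty`, its arguments, and the pixel values are assumptions for illustration; it is not taken from any particular data-reduction package.

```python
import math

def bin_uncertainty(intensities, sigmas, min_relative=0.01):
    """Mean intensity and uncertainty for the pixels in one Q-bin.

    intensities: pixel intensities I_j falling in the bin
    sigmas: photon-counting uncertainties sigma_j of those pixels
    min_relative: floor on the relative uncertainty (1% in the text)
    """
    n = len(intensities)
    mean_i = sum(intensities) / n

    # Propagated photon-counting error of the mean
    counting = math.sqrt(sum(s ** 2 for s in sigmas)) / n

    # Standard error of the mean of the pixel values (needs n >= 2)
    if n > 1:
        var = sum((i - mean_i) ** 2 for i in intensities) / (n - 1)
        sem = math.sqrt(var / n)
    else:
        sem = 0.0

    # Floor at a fixed fraction of the bin intensity
    floor = min_relative * mean_i

    return mean_i, max(counting, sem, floor)

# Hypothetical pixel counts with Poisson uncertainties sqrt(counts)
counts = [100.0, 110.0, 90.0, 105.0]
mean_i, sigma_bin = bin_uncertainty(counts, [math.sqrt(c) for c in counts])
print(mean_i, sigma_bin)
```

For these made-up counts the photon-counting term is the largest of the three, so it sets the reported uncertainty; with noisier pixels the standard error of the mean would take over.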

This leaves the choice as to what the grid spacing should be. Typically users opt for either uniform or logarithmically spaced $Q$-grids, the latter providing more points at lower $Q$ values [3].

So in SAS, the reported error is the standard error of the mean. Least-squares fits should be weighted by the inverse of the variance, but typically the number of points in each $Q$-bin, $N_{Q_\text{bin}}$, is not reported.

$$ y = m x + b $$
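A weighted straight-line fit of this form can be sketched with the closed-form least-squares solution, weighting each point by the inverse of its variance. The function name and the data points are hypothetical; the uncertainties `sigma` stand in for whatever error estimate accompanies each point.

```python
def weighted_linear_fit(x, y, sigma):
    """Fit y = m*x + b, weighting each point by 1/sigma**2."""
    w = [1.0 / s ** 2 for s in sigma]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = sw * swxx - swx ** 2
    m = (sw * swxy - swx * swy) / delta    # slope
    b = (swxx * swy - swx * swxy) / delta  # intercept
    return m, b

# Hypothetical points lying exactly on y = 2x + 1, with unequal uncertainties
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
sigma = [0.1, 0.2, 0.1, 0.4]
m, b = weighted_linear_fit(x, y, sigma)
print(m, b)
```

Points with small uncertainties dominate the sums, so whether `sigma` is a standard deviation or a standard error of the mean changes the relative weights only if the number of points per bin varies.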

Standard deviation in $Q$: $\sigma_Q$

Often, the standard deviation of the $Q$-values is also reported. Similar to the procedure for the scattering intensity, this can be converted to a standard error of the mean by dividing by $\sqrt{N_{Q_\text{bin}}}$,

$$ \sigma_{Q} = \dfrac{1}{\sqrt{N_{Q_\text{bin}}}} \sqrt{\dfrac{\displaystyle\sum_{Q_j\in\left[Q_k,\,Q_{k+1}\right]}\!\!\!\!\!\!\!\!\left(Q_j-\bar{Q}\right)^2}{N_{Q_\text{bin}}-1}} \,, $$

where

$$ \bar{Q} = \langle Q_j\in\left[Q_k,\,Q_{k+1}\right]\rangle\,. $$

I have never seen the $Q$ standard deviation or standard error of the mean factored into the Guinier fitting, or even plotted for that matter. What difference would this make? Should it be included?


